Benchmark Report

All Results (Table)

Benchmark | PPM | # Samples | Wall | User | Kernel | nvprof | nvprof Kernel | nvprof H->D | nvprof D->H | Max RSS | Peak Allocation
idioms.assign serial 14 3.37µs 3.29µs None N/A N/A N/A N/A N/A 78.12KiB
idioms.assign cuda 910 264.20µs 261.34µs 5.14µs 15.04µs 2.51µs N/A 6.76µs 156.54MiB 78.12KiB
idioms.assign sycl 16514 30.64µs 28.29µs 2.42µs N/A N/A N/A N/A 143.38MiB 78.12KiB
idioms.pointwise serial 1734 3.66µs 3.44µs 195.50ns N/A N/A N/A N/A N/A 78.12KiB
idioms.pointwise cuda 846 320.20µs 319.21µs 6.94µs 24.22µs 3.09µs 8.27µs 6.75µs 156.37MiB 78.12KiB
idioms.pointwise sycl 34432 271.69µs 153.81µs 118.05µs N/A N/A N/A N/A 144.48MiB 78.12KiB
idioms.reduction serial 4 8.90µs 9.00µs None N/A N/A N/A N/A N/A 8.00B
idioms.reduction cuda 266 400.38µs 257.56µs 148.76µs 22.96µs 12.43µs N/A 1.12µs 154.35MiB 8.00B
suites.polybench.2mm serial 4 9.60ms 9.60ms None N/A N/A N/A N/A N/A 1.19MiB
suites.polybench.2mm cuda 113 1.23ms 1.23ms 4.47µs 942.97µs 712.28µs 174.42µs 20.38µs 157.93MiB 1.00MiB
suites.polybench.2mm sycl 17351 450.78µs 449.69µs 1.25µs N/A N/A N/A N/A 159.22MiB 1.00MiB
suites.polybench.3mm serial 4 15.37ms 15.37ms None N/A N/A N/A N/A N/A 1.72MiB
suites.polybench.3mm cuda 82 1.59ms 1.58ms 10.40µs 1.31ms 1.08ms 175.20µs 19.07µs 157.95MiB 1.28MiB
suites.polybench.3mm sycl 30738 238.26µs 237.49µs 1.01µs N/A N/A N/A N/A 165.64MiB 1.28MiB
suites.polybench.adi serial 4 85.26ms 85.26ms None N/A N/A N/A N/A N/A 1.22MiB
suites.polybench.adi cuda 5 35.15ms 35.17ms None 34.77ms 34.70ms 27.83µs 26.42µs 159.67MiB 312.50KiB
suites.polybench.adi sycl 294 32.37ms 32.33ms 38.58µs N/A N/A N/A N/A 158.11MiB 312.50KiB
suites.polybench.atax serial 6 167.25µs 167.17µs None N/A N/A N/A N/A N/A 1.16MiB
suites.polybench.atax cuda 262 579.63µs 585.31µs 717.56ns 275.89µs 147.20µs 112.61µs 1.20µs 157.16MiB 1.16MiB
suites.polybench.atax sycl 37773 141.38µs 140.66µs 1.00µs N/A N/A N/A N/A 167.45MiB 1.16MiB
suites.polybench.bicg serial 4 34.03µs 34.00µs None N/A N/A N/A N/A N/A 135.16KiB
suites.polybench.bicg cuda 585 393.32µs 377.36µs 19.42µs 142.01µs 96.08µs 27.89µs 11.23µs 156.30MiB 135.16KiB
suites.polybench.bicg sycl 37944 126.98µs 121.31µs 5.91µs N/A N/A N/A N/A 171.54MiB 135.16KiB
suites.polybench.cholesky serial 7 1.09ms 1.09ms None N/A N/A N/A N/A N/A 312.50KiB
suites.polybench.cholesky cuda 51 2.52ms 2.51ms 75.12µs 2.19ms 2.13ms 27.63µs 25.41µs 157.48MiB 312.50KiB
suites.polybench.correlation serial 4 6.87ms 6.87ms None N/A N/A N/A N/A N/A 812.05KiB
suites.polybench.correlation cuda 134 1.09ms 1.09ms 12.31µs 822.53µs 660.26µs 38.27µs 112.29µs 159.73MiB 812.05KiB
suites.polybench.correlation sycl 17039 545.98µs 545.35µs 941.19ns N/A N/A N/A N/A 160.69MiB 812.05KiB
suites.polybench.covariance serial 4 6.74ms 6.73ms 4.25µs N/A N/A N/A N/A N/A 871.03KiB
suites.polybench.covariance cuda 126 1.12ms 1.12ms 6.90µs 783.34µs 694.24µs 40.08µs 43.38µs 159.02MiB 871.03KiB
suites.polybench.covariance sycl 30592 304.30µs 301.59µs 3.06µs N/A N/A N/A N/A 170.75MiB 871.03KiB
suites.polybench.deriche serial 6 2.01ms 2.01ms None N/A N/A N/A N/A N/A 3.71MiB
suites.polybench.deriche cuda 150 957.04µs 955.54µs 8.39µs 509.43µs 349.02µs 80.27µs 74.54µs 158.28MiB 1.85MiB
suites.polybench.deriche sycl 24935 373.80µs 372.95µs 1.24µs N/A N/A N/A N/A 163.73MiB 1.85MiB
suites.polybench.doitgen serial 4 7.27ms 7.27ms None N/A N/A N/A N/A N/A 1.41MiB
suites.polybench.doitgen cuda 74 1.80ms 1.75ms 55.03µs 1.23ms 958.44µs 143.00µs 126.53µs 157.27MiB 1.41MiB
suites.polybench.doitgen sycl 19005 449.91µs 448.77µs 1.42µs N/A N/A N/A N/A 164.40MiB 1.41MiB
suites.polybench.durbin serial 185 110.10µs 110.11µs None N/A N/A N/A N/A N/A 9.38KiB
suites.polybench.durbin cuda 8 23.41ms 23.77ms 171.00µs 23.11ms 23.04ms 23.10ms 23.09ms 165.16MiB 6.25KiB
suites.polybench.fdtd-2d serial 4 10.83ms 10.82ms 8.50µs N/A N/A N/A N/A N/A 1.10MiB
suites.polybench.fdtd-2d cuda 38 3.32ms 3.34ms 20.24µs 2.97ms 2.53ms 193.74µs 239.80µs 160.77MiB 1.10MiB
suites.polybench.fdtd-2d sycl 1982 4.89ms 4.89ms 2.32µs N/A N/A N/A N/A 158.54MiB 1.10MiB
suites.polybench.floyd-warshall serial 4 89.73ms 89.72ms 5.25µs N/A N/A N/A N/A N/A 1.91MiB
suites.polybench.floyd-warshall cuda 25 5.65ms 5.69ms None 5.03ms 4.63ms 192.96µs 194.71µs 159.56MiB 1.91MiB
suites.polybench.floyd-warshall sycl 1821 5.33ms 5.33ms 4.37µs N/A N/A N/A N/A 156.75MiB 1.91MiB
suites.polybench.gemm serial 4 3.20ms 3.12ms 74.25µs N/A N/A N/A N/A N/A 1.00MiB
suites.polybench.gemm cuda 101 1.07ms 1.06ms 26.27µs 751.58µs 570.24µs 149.17µs 23.54µs 155.74MiB 1.00MiB
suites.polybench.gemm sycl 22777 356.47µs 355.73µs 940.33ns N/A N/A N/A N/A 162.10MiB 1.00MiB
suites.polybench.gemver serial 4 398.83µs 399.00µs None N/A N/A N/A N/A N/A 1.25MiB
suites.polybench.gemver cuda 134 1.01ms 1.01ms 1.06µs 776.23µs 167.22µs 273.82µs 325.36µs 157.23MiB 1.25MiB
suites.polybench.gemver sycl 33275 188.31µs 187.35µs 1.12µs N/A N/A N/A N/A 166.21MiB 1.25MiB
suites.polybench.gesummv serial 4 395.29µs 380.00µs 15.25µs N/A N/A N/A N/A N/A 2.45MiB
suites.polybench.gesummv cuda 162 813.37µs 820.04µs None 506.72µs 199.00µs 299.35µs 1.25µs 158.23MiB 2.45MiB
suites.polybench.gesummv sycl 33526 203.32µs 202.05µs 1.48µs N/A N/A N/A N/A 167.05MiB 2.45MiB
suites.polybench.gramschmidt serial 4 9.30ms 9.29ms 11.25µs N/A N/A N/A N/A N/A 1.04MiB
suites.polybench.gramschmidt cuda 13 29.11ms 29.25ms 5.69µs 28.75ms 28.49ms 28.52ms 28.70ms 166.39MiB 1.04MiB
suites.polybench.heat-3d serial 4 18.40ms 18.39ms 6.75µs N/A N/A N/A N/A N/A 1000.00KiB
suites.polybench.heat-3d cuda 37 3.47ms 3.48ms 6.73µs 3.11ms 3.02ms 43.11µs 39.89µs 156.69MiB 1000.00KiB
suites.polybench.heat-3d sycl 83692 60.19µs 58.85µs 1.53µs N/A N/A N/A N/A 190.71MiB 1000.00KiB
suites.polybench.jacobi-1d serial 2419 2.30µs 1.83µs 411.33ns N/A N/A N/A N/A N/A 6.25KiB
suites.polybench.jacobi-1d cuda 1313 259.24µs 259.29µs 2.36µs 19.07µs 6.80µs 1.26µs 1.19µs 156.12MiB 6.25KiB
suites.polybench.jacobi-1d sycl 86087 37.72µs 36.16µs 1.79µs N/A N/A N/A N/A 189.36MiB 6.25KiB
suites.polybench.jacobi-2d serial 4 242.34µs 242.00µs None N/A N/A N/A N/A N/A 976.56KiB
suites.polybench.jacobi-2d cuda 462 447.57µs 437.24µs 16.86µs 102.51µs 15.80µs 42.18µs 38.92µs 157.39MiB 976.56KiB
suites.polybench.jacobi-2d sycl 81824 33.58µs 32.30µs 1.46µs N/A N/A N/A N/A 188.72MiB 976.56KiB
suites.polybench.lu serial 4 51.57ms 51.57ms None N/A N/A N/A N/A N/A 1.22MiB
suites.polybench.lu cuda 49 5.50ms 5.50ms 56.86µs 4.98ms 4.76ms 110.29µs 109.19µs 157.16MiB 1.22MiB
suites.polybench.lu sycl 1061 9.16ms 9.03ms 131.42µs N/A N/A N/A N/A 158.22MiB 1.22MiB
suites.polybench.ludcmp serial 8 1.82ms 1.82ms 4.75µs N/A N/A N/A N/A N/A 317.19KiB
suites.polybench.ludcmp cuda 1 5.87s 5.94s None 5.87s 5.87s 5.87s 5.87s 169.43MiB 315.62KiB
suites.polybench.mvt serial 4 290.90µs 291.00µs None N/A N/A N/A N/A N/A 1.23MiB
suites.polybench.mvt cuda 236 605.50µs 609.83µs 1.60µs 307.27µs 143.05µs 143.23µs 11.98µs 157.09MiB 1.23MiB
suites.polybench.mvt sycl 35669 129.82µs 123.96µs 6.13µs N/A N/A N/A N/A 171.91MiB 1.23MiB
suites.polybench.nussinov serial 81 303.04µs 302.30µs 851.85ns N/A N/A N/A N/A N/A 113.44KiB
suites.polybench.nussinov cuda 105 1.34ms 1.32ms 24.28µs 1.08ms 1.02ms 34.97µs 9.33µs 156.64MiB 113.44KiB
suites.polybench.nussinov sycl 5990 1.58ms 1.58ms 556.26ns N/A N/A N/A N/A 154.80MiB 113.44KiB
suites.polybench.seidel-2d serial 4 1.65ms 1.65ms None N/A N/A N/A N/A N/A 1.22MiB
suites.polybench.seidel-2d cuda 193 819.67µs 824.19µs 1.26µs 249.59µs 25.84µs 108.96µs 108.83µs 157.35MiB 1.22MiB
suites.polybench.seidel-2d sycl 61457 74.30µs 72.46µs 2.02µs N/A N/A N/A N/A 181.05MiB 1.22MiB
suites.polybench.symm serial 4 15.36ms 15.36ms None N/A N/A N/A N/A N/A 1.11MiB
suites.polybench.symm cuda 87 1.50ms 1.45ms 59.40µs 1.18ms 1.04ms 95.69µs 31.66µs 157.79MiB 1.11MiB
suites.polybench.symm sycl 33095 197.91µs 196.48µs 1.74µs N/A N/A N/A N/A 165.41MiB 1.11MiB
suites.polybench.syr2k serial 4 9.41ms 9.41ms 1.75µs N/A N/A N/A N/A N/A 1.17MiB
suites.polybench.syr2k cuda 109 1.22ms 1.22ms 504.59ns 836.69µs 626.18µs 166.89µs 35.99µs 157.23MiB 1.17MiB
suites.polybench.syr2k sycl 1302 7.46ms 7.45ms 9.33µs N/A N/A N/A N/A 155.72MiB 1.17MiB
suites.polybench.syrk serial 4 4.37ms 4.37ms None N/A N/A N/A N/A N/A 825.00KiB
suites.polybench.syrk cuda 121 1.09ms 1.06ms 40.19µs 732.82µs 589.06µs 102.17µs 35.71µs 157.41MiB 825.00KiB
suites.polybench.syrk sycl 1519 6.38ms 6.38ms 1.46µs N/A N/A N/A N/A 154.95MiB 825.00KiB
suites.polybench.trisolv serial 8 65.20µs 65.25µs None N/A N/A N/A N/A N/A 1.23MiB
suites.polybench.trisolv cuda 10 18.12ms 18.28ms 5.60µs 17.79ms 17.62ms 17.78ms 17.64ms 167.88MiB 1.23MiB
suites.polybench.trmm serial 4 4.95ms 4.94ms 17.00µs N/A N/A N/A N/A N/A 750.00KiB
suites.polybench.trmm cuda 28 4.62ms 4.58ms 40.61µs 4.27ms 4.14ms 94.99µs 30.20µs 157.17MiB 750.00KiB
suites.polybench.trmm sycl 2444 3.95ms 3.95ms 1.79µs N/A N/A N/A N/A 156.80MiB 750.00KiB
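
The rows above can be compared across PPMs by converting the Wall column to seconds and dividing. The following is a minimal sketch, not the report generator's own code: the helper names (`parse_time`, `wall_times`, `speedup`) are hypothetical, a bare number without a unit is assumed to be seconds, and speedup is assumed to mean baseline wall time divided by target wall time, which may differ from the definition used for the plots in this report.

```python
import re

# Multipliers to seconds. Both the micro sign and the Greek letter mu are accepted.
# Assumption: a cell with no unit suffix is already in seconds.
_TIME_UNITS = {"ns": 1e-9, "µs": 1e-6, "μs": 1e-6, "ms": 1e-3, "s": 1.0, "": 1.0}


def parse_time(cell):
    """Convert a cell such as '264.20µs' to seconds; 'None' and 'N/A' give None."""
    if cell in ("None", "N/A"):
        return None
    match = re.fullmatch(r"([0-9.]+)(ns|µs|μs|ms|s|)", cell)
    if match is None:
        raise ValueError(f"unrecognized time cell: {cell!r}")
    return float(match.group(1)) * _TIME_UNITS[match.group(2)]


def wall_times(rows):
    """Map (benchmark, ppm) to wall time in seconds.

    Each row is whitespace-separated: benchmark, PPM, sample count, Wall, ...
    """
    times = {}
    for row in rows:
        fields = row.split()
        times[(fields[0], fields[1])] = parse_time(fields[3])
    return times


def speedup(times, benchmark, baseline, target):
    """Baseline wall time divided by target wall time (above 1 means target is faster)."""
    return times[(benchmark, baseline)] / times[(benchmark, target)]


# Three rows copied from the table above.
rows = [
    "suites.polybench.2mm serial 4 9.60ms 9.60ms None N/A N/A N/A N/A N/A 1.19MiB",
    "suites.polybench.2mm cuda 113 1.23ms 1.23ms 4.47µs 942.97µs 712.28µs 174.42µs 20.38µs 157.93MiB 1.00MiB",
    "suites.polybench.2mm sycl 17351 450.78µs 449.69µs 1.25µs N/A N/A N/A N/A 159.22MiB 1.00MiB",
]
times = wall_times(rows)
print(round(speedup(times, "suites.polybench.2mm", "serial", "cuda"), 2))
```

Under these assumptions, the 2mm benchmark runs about 7.8 times faster with cuda than with serial when measured by wall time (9.60ms against 1.23ms).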

Walltime Plot

[Figure: walltime plot. SVG generated 2023-08-02T03:38:02 with Matplotlib v3.7.1.]

Speedup Plot: cuda

[Figure: speedup plot for cuda. SVG generated 2023-08-02T03:38:07 with Matplotlib v3.7.1.]

Speedup Plot: serial

[Figure: speedup plot for serial. SVG generated 2023-08-02T03:38:08 with Matplotlib v3.7.1.]

Speedup Plot: sycl

[Figure: speedup plot for sycl. SVG generated 2023-08-02T03:38:08 with Matplotlib v3.7.1.]